With the advent of Wi-Fi and local area networks, Real-Time Location Systems (RTLS) can be used to determine the position of an object within a specified area in real time. This is made possible by continuous communication between a device carried by the object being tracked and the beacons or receivers operated by the host. Technologies used for RTLS include infrared, Bluetooth, cellular, and Radio Frequency Identification (RFID). To operate an RTLS, a scanning device is required to locate an object (such as a cell phone or laptop) based on the angle and coordinates of the object being tracked. Multiple scanning devices can be used to triangulate the location of the object from the combination of angles at which its signal reaches the respective receivers.
Using wireless network signals measured in an office building, we aim to estimate the location of a device in real time from the signal strengths it records from different combinations of access points, identified by their MAC addresses. We perform unweighted and weighted k-nearest neighbors (k-NN) analyses to predict the locations in the online data using the offline data. We then examine whether different combinations of the available MAC addresses improve those predictions.
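To make the difference between the unweighted and weighted k-NN estimates concrete, here is a minimal sketch on a made-up training set. It is not the implementation used later in this report: the function name `predict_xy`, the toy signal values, the Euclidean distance in signal space, and the inverse-distance weights are all assumptions chosen for illustration.

```r
# Hypothetical toy training set: mean signal strength (dBm) from three
# access points at four known (x, y) locations. All values are made up.
train_signals <- rbind(
  c(-40, -60, -70),
  c(-45, -55, -72),
  c(-70, -50, -45),
  c(-72, -48, -40)
)
train_xy <- rbind(c(0, 0), c(1, 0), c(10, 5), c(11, 5))

# Estimate (x, y) for one new signal vector from its k nearest training points.
# weighted = FALSE: plain average of the k neighbor locations.
# weighted = TRUE:  average weighted by inverse distance in signal space.
predict_xy <- function(new_signal, train_signals, train_xy, k = 2,
                       weighted = FALSE) {
  # Euclidean distance in signal space to every training point
  diffs <- sweep(train_signals, 2, new_signal)
  dists <- sqrt(rowSums(diffs^2))
  nearest <- order(dists)[seq_len(k)]
  if (weighted) {
    w <- 1 / pmax(dists[nearest], 1e-6)  # guard against division by zero
    w <- w / sum(w)
  } else {
    w <- rep(1 / k, k)
  }
  # Row i of the neighbor matrix is scaled by w[i], then columns are summed
  colSums(train_xy[nearest, , drop = FALSE] * w)
}

new_signal <- c(-41, -59, -70)
unweighted_est <- predict_xy(new_signal, train_signals, train_xy, k = 2)
weighted_est <- predict_xy(new_signal, train_signals, train_xy, k = 2,
                           weighted = TRUE)
```

In this toy case the new signal is much closer to the first training point than to the second, so the weighted estimate is pulled toward (0, 0), while the unweighted estimate sits at the midpoint of the two neighbors.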
For this project, we use two separate datasets. The first is a reference set named "offline", which contains signal strength measurements from a hand-held device on a gridwork of 166 different points spaced 1 meter apart. This gridwork is located in the hallways of a one-floor building at the University of Mannheim. The second dataset, titled "online", is used to test how well our k-NN model predicts location. It includes 60 different locations chosen at random, with 110 signals measured at each. Figure 1.1 below shows a map of the "online" test locations (black dots) overlaid with the "offline" training locations (grey dots). Both datasets contain the same features and require the same cleaning procedures.
knitr::include_graphics("CleverZonkedElk.png")
Office layout: The floor plan of the experimental environment, a one-floor building at the University of Mannheim. The offline training points (grey dots) are spaced about a meter apart throughout the hallways, and the online test points (black dots) are scattered among them. Wi-Fi access points are denoted by black squares.
pander::pander(
  list(
    t = "Timestamp in milliseconds since 12:00 am, January 1, 1970",
    Id = "MAC address of the scanning device",
    Pos = "Physical coordinates of the scanning device",
    Degree = "Orientation of the scanning device carried by the researcher, measured in degrees",
    MAC = "MAC address of a responding peer (an access point or an adhoc device), with its signal strength (dBm), channel frequency, and mode (adhoc device = 1, access point = 3)",
    Signal = "Received signal strength in dBm"
  )
)
The first cleaning step is to break up the position variable into separate variables that we can use to triangulate the location. In the raw dataset, each position holds x, y, and z (elevation) coordinates separated by commas, which we convert into PosX, PosY, and PosZ. PosZ turned out to have a single unique value, 0 (which makes sense, since the experiment took place in a one-story building), so we drop that variable. Likewise, checking the number of unique values in the ScanMac column yielded only a single value, so we drop that column as well.
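As a small illustration of this splitting step, the sketch below tokenizes one record and pulls out the three position values. The record itself is made up (including the scanner MAC and timestamp) and only mimics the layout that the delimiter pattern `[;=,]` implies; the variable names are hypothetical.

```r
# Hypothetical raw record, laid out as
# t=...;id=...;pos=x,y,z;degree=...;<MAC>=<signal>,<channel>,<type>;...
sample_line <- paste0(
  "t=129600000;id=aa:bb:cc:dd:ee:ff;pos=2.0,3.0,0.0;degree=45.2;",
  "00:14:bf:b1:97:8a=-38,2437000000,3;",
  "00:0f:a3:39:e1:c0=-53,2462000000,3"
)

# Split on ';', '=' and ','. MAC addresses use ':' and so stay intact.
tokens <- strsplit(sample_line, "[;=,]")[[1]]

# Tokens 1-10 are: "t", time, "id", scanner MAC, "pos", x, y, z,
# "degree", orientation. The position values are tokens 6 to 8.
position <- as.numeric(tokens[6:8])
names(position) <- c("posX", "posY", "posZ")

# The remaining tokens come in groups of four, one group per responding device
readings <- matrix(tokens[-(1:10)], ncol = 4, byrow = TRUE,
                   dimnames = list(NULL, c("mac", "signal", "channel", "type")))
```

Applying `length(unique(...))` to a parsed column such as `posZ` across all records is one way to confirm that it carries a single value before dropping it.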
The device type in our data is a mix of the values 1 and 3. According to the documentation, only the fixed access points (value = 3) are relevant to our study, which predicts device locations from a fixed set of receivers. We therefore remove the adhoc instances (value = 1) from the dataset.
The time measurement needs an adjustment so that it is easier to analyze later. As mentioned earlier, time is recorded as the number of milliseconds since a reference date, so we first divide by 1,000 to obtain seconds and then convert the result to a year-month-day-time format. We also remove the channel feature, since it is a character code that is redundant with the other identifiers (MAC address, signal strength, frequency, and mode) and could play an unfair role in our predictive modeling. This leaves us with the following features, which we use across both the offline and online data.
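The time conversion can be sketched on a single value as follows. The timestamp is made up, and the explicit `tz = "UTC"` is an assumption added here so that the printed result does not depend on the local time zone.

```r
# Hypothetical timestamp: 1.5 days after the epoch, in milliseconds
raw_ms <- 129600000

# Milliseconds to seconds
secs <- raw_ms / 1000

# Seconds since 1970-01-01 to a date-time object (time zone is an assumption)
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Year-month-day-time representation
formatted <- format(stamp, "%Y-%m-%d %H:%M:%S")
```

Here `formatted` is "1970-01-02 12:00:00". Keeping the raw millisecond value in a separate column preserves the original resolution if it is needed later.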
# processLine splits one raw record on its delimiters (; = ,) and returns a
# matrix with one row per responding device, or NULL if the record has none
processLine = function(x) {
  tokens = strsplit(x, "[;=,]")[[1]]
  # exactly 10 tokens means only the header fields are present
  if (length(tokens) == 10) {
    return(NULL)
  }
  # remaining tokens come in groups of four: mac, signal, channel, type
  tmp = matrix(tokens[-(1:10)], , 4, byrow = TRUE)
  # repeat the six header values (time, scanMac, x, y, z, orientation) on every row
  cbind(matrix(tokens[c(2, 4, 6:8, 10)], nrow(tmp), 6, byrow = TRUE), tmp)
}
# roundReaderOrientation maps each angle to the nearest multiple of 45 degrees.
# Finer resolution is not needed for this analysis, and the coarser grouping
# simplifies the later calculations. Values nearest 360 wrap back to 0.
roundReaderOrientation = function(orientation) {
  refs = seq(0, by = 45, length.out = 9)
  angle = sapply(orientation, function(o) which.min(abs(o - refs)))
  c(refs[1:8], 0)[angle]
}
# read in the data for the appropriate mac addresses
readData <- function(filename, subMacs = c("00:0f:a3:39:e1:c0", "00:0f:a3:39:dd:cd",
                                           "00:14:bf:b1:97:8a", "00:14:bf:3b:c7:c6",
                                           "00:14:bf:b1:97:90", "00:14:bf:b1:97:8d",
                                           "00:14:bf:b1:97:81")) {
  txt = readLines(filename)
  lines = txt[substr(txt, 1, 1) != "#"]
  tmp = lapply(lines, processLine)
  # create dataframe and name columns
  data = as.data.frame(do.call(rbind, tmp), stringsAsFactors = FALSE)
  names(data) = c("time", "scanMac", "posX", "posY", "posZ",
                  "orientation", "mac", "signal", "channel",
                  "type")
  # keep only signals from access points (=3)
  data = data[data$type == "3", ]
  # drop scanMac, posZ, channel, and type - no info in them
  dropVars = c("scanMac", "posZ", "channel", "type")
  data = data[, !(names(data) %in% dropVars)]
  # drop more unwanted access points
  data = data[data$mac %in% subMacs, ]
  # convert numeric values
  numVars = c("time", "posX", "posY", "orientation", "signal")
  data[numVars] = lapply(data[numVars], as.numeric)
  # convert time to POSIX
  data$rawTime = data$time
  data$time = data$time/1000
  class(data$time) = c("POSIXt", "POSIXct")
  # round orientations to nearest 45 degree angle
  data$angle = roundReaderOrientation(data$orientation)
  return(data)
}
offline <- readData("offline.final.trace.txt")
online <- readData("online.final.trace.txt")
Reviewing our cleansed data set, we can see the following:
head(offline)
It should be noted that the original researchers chose to exclude data from one access point. Of the 7 access points, 2 were Alpha routers: MAC 00:0f:a3:39:e1:c0 was kept for the analysis, and MAC 00:0f:a3:39:dd:cd was removed. In this analysis we will endeavor to determine whether this exclusion was warranted.
According to our documentation, we should have only 8 values for orientation. We therefore simplify the reader orientation by rounding each measurement to the nearest 45-degree angle. Finer resolution is not needed for this analysis, so 45-degree increments are sufficient, and the coarser grouping also reduces computational effort.
Looking at the orientation column of our dataset, we see a wide variety of angles clustered around the expected values (such as 179 or 181 around 180), as shown in figure 2.1. Since we focus on measuring signal strength at 8 orientations in 45-degree increments, we round each orientation to the nearest 45-degree increment. Values close to 360 are mapped back to zero.
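The rounding rule can be checked on a handful of made-up angles. The helper below, `round_to_45`, is a hypothetical arithmetic shortcut (divide by 45, round, multiply back, wrap with modulo 360) rather than the reference-grid lookup used in `roundReaderOrientation`; the two may differ for angles that fall exactly halfway between two multiples of 45.

```r
# Hypothetical helper: round to the nearest multiple of 45 degrees,
# then wrap 360 back to 0 with the modulo operator
round_to_45 <- function(orientation) {
  (round(orientation / 45) * 45) %% 360
}

# Made-up orientations scattered around the expected angles
sample_orientations <- c(0.3, 44.1, 91.7, 179, 181, 320, 359.6)
rounded <- round_to_45(sample_orientations)
```

With these inputs, `rounded` is 0, 45, 90, 180, 180, 315, 0: both 179 and 181 collapse to 180, and 359.6 wraps to 0 rather than 360.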
length(unique(offline$orientation))
#> [1] 203
plot(ecdf(offline$orientation), main = 'Orientation for the Hand-Held Device', xlab = 'Orientation', ylab = 'CDF')
Orientation for the Hand-Held Device: The empirical CDF of the recorded orientation values. The values are scattered around each major orientation (45, 90, 135, 180, etc.), so our cleaning procedure rounds them to the nearest 45-degree angle.
After making the adjustment, the rounded values line up with the 8 angles we are using.
with(offline, boxplot(orientation ~ angle,
xlab = 'Rounded 45 Degree Angle',
ylab = 'Orientation'))